Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts
Abstract
Text normalization is an important part of successful information retrieval from medical documents such as clinical notes, radiology reports, and discharge summaries. In the medical domain, a significant part of the general text normalization problem is abbreviation and acronym disambiguation. Numerous abbreviations are used routinely throughout such texts, and knowing their meaning is critical to retrieving data from the document. In this paper I demonstrate a method of automatically generating training data for Maximum Entropy (ME) modeling of abbreviations and acronyms, and show that ME modeling is a promising technique for abbreviation and acronym normalization. I report the results of an experiment in which a number of ME models were trained to normalize abbreviations and acronyms on a sample of 10,000 rheumatology notes, with ~89% accuracy.
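The abstract does not spell out how the training data is generated or which features the ME models use, so the following is only a minimal sketch under stated assumptions. It assumes that training contexts can be harvested from sentences where an expansion is written out in full, and that the model is a conditional maximum entropy classifier (equivalent to multinomial logistic regression) over bag-of-words context features. The toy sentences, the function names `harvest`, `train`, `predict_proba`, and `disambiguate`, and the hyperparameters are all hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict

def harvest(sentences, expansions):
    """Assumed data generation: take the words around a fully spelled-out
    expansion as a training context labelled with that expansion."""
    data = []
    for s in sentences:
        low = s.lower()
        for exp in expansions:
            if exp in low:
                context = low.replace(exp, " ").split()
                data.append((context, exp))
    return data

def predict_proba(w, context, labels):
    """p(y | x) proportional to exp(sum of weights of (word, y) features)."""
    scores = {y: sum(w.get((word, y), 0.0) for word in context) for y in labels}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(data, labels, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    w = defaultdict(float)  # one weight per (context word, label) feature
    for _ in range(epochs):
        for context, gold in data:
            probs = predict_proba(w, context, labels)
            for y in labels:
                # gradient = observed feature count - expected feature count
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for word in context:
                    w[(word, y)] += lr * grad
    return w

def disambiguate(w, sentence, abbrev, labels):
    """Pick the most probable expansion for abbrev given the other words."""
    context = [t for t in sentence.lower().split() if t != abbrev.lower()]
    probs = predict_proba(w, context, labels)
    return max(probs, key=probs.get)

# Toy corpus (invented for illustration, not from the paper's data).
sentences = [
    "patient has rheumatoid arthritis with joint swelling",
    "rheumatoid arthritis causing joint pain and stiffness",
    "echo shows right atrium enlarged",
    "dilated right atrium on cardiac echo",
]
expansions = ["rheumatoid arthritis", "right atrium"]

data = harvest(sentences, expansions)
weights = train(data, expansions)
print(disambiguate(weights, "RA with joint pain", "RA", expansions))
print(disambiguate(weights, "echo shows enlarged RA", "RA", expansions))
```

The appeal of such a setup is that no manual annotation is needed: the spelled-out expansions label their own contexts. A real system would need richer features and a sense inventory per abbreviation, neither of which is described in the abstract.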
Similar resources
A Supervised Abbreviation Resolution System for Medical Text
We present our participation in Task 2 of the 2013 CLEFeHEALTH Challenge, whose goal was to determine the UMLS concept unique identifier (CUI), if available, of an abbreviation or acronym. We hypothesize that considering only the abbreviations of the training corpus could be sufficient to provide a strong baseline for this task. We therefore test how a fully supervised approach, which predicts ...
A Maximum Entropy Approach to Semi-supervised Learning
Various supervised inference methods can be analyzed as convex duals of a generalized maximum entropy framework, where the goal is to find a distribution with maximum entropy subject to the moment matching constraints on the data. We extend this framework to semi-supervised learning using two approaches: 1) by incorporating unlabeled data into the data constraints and 2) by imposing similarity ...
Semi-Supervised Learning via Generalized Maximum Entropy
Various supervised inference methods can be analyzed as convex duals of the generalized maximum entropy (MaxEnt) framework. Generalized MaxEnt aims to find a distribution that maximizes an entropy function while respecting prior information represented as potential functions in miscellaneous forms of constraints and/or penalties. We extend this framework to semi-supervised learning by incorpora...
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach using random forest as a base classifier to classify the low-back disorders (LBDs) risk associated with the industrial jobs. Semi-supervised classification approach uses unlabeled data together with the small number of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...
Graph Based Semi-Supervised Approach For Information Extraction
Classification techniques deploy supervised labeled instances to train classifiers for various classification problems. However labeled instances are limited, expensive, and time consuming to obtain, due to the need of experienced human annotators. Meanwhile large amount of unlabeled data is usually easy to obtain. Semi-supervised learning addresses the problem of utilizing unlabeled data along...